ABSTRACT


A generic architecture for neural net multisensor data fusion is introduced and
analyzed. The architecture consists of a set of independent sensor neural nets, one
for each sensor, coupled to a fusion net. Each sensor net is trained (from a representative
data set of the particular sensor) to map to a hypothesis space output. The
decision outputs from the sensor nets are used to train the fusion net to an overall
decision. In this report the sensor fusion architecture is applied to the stochastic
exclusive-or problem for a benchmark comparison with classical hypothesis testing.
The architecture is also applied to a data fusion experiment involving the multisensor
observation of object deployments during the recent Firefly launches. The
deployments were measured simultaneously by X- and L-band and CO2 laser radars.
The range-Doppler images from the X-band and CO2 laser radars were combined
with a passive-IR spectral simulation of the deployment to form the data inputs
to the neural sensor fusion system. The network was trained to distinguish predeployment,
deployment, and postdeployment phases of the launch based on the
fusion of these sensors. The success of the system in utilizing sensor synergism for
an enhanced deployment detection is clearly demonstrated.
1. INTRODUCTION


A highly successful intuitive architecture for hypothesis testing from fused multisensor data
consists of distributed single-sensor processors coupled to a fusion processor for an overall decision.
Each single-sensor processor outputs a decision based only on the individual sensor data, which
forms the input to the fusion processor. Optimal signal processing in a distributed sensor environment
based on statistical estimation and hypothesis testing techniques has been considered in
Tenney and Sandell [1], Sadjadi [2], Chair and Varshney [3], Thomopoulos et al. [4], Atteson et al.
[5], Reibman and Nolte [6], and Dasarathy [7]. As with any Bayesian approach to hypothesis testing,
optimum tests for data fusion are a function of the probability distributions of the input data.
The design of such tests often involves an assumed model for the observed phenomena to define the
data distributions. Alternatively, data-adaptive hypothesis testing results in a test based only on a
previously generated training data set [8]. The outcome of a data-adaptive test is estimated from
the system performance on a performance set of generated data with known hypotheses. The averaged
system performance is simply obtained by applying the testing to an ensemble of training and
performance sets. A theoretical treatment of data-adaptive hypothesis testing, with performance
estimates based on the statistics of the training set, is given in Levine and Khuon [9]. It should be
emphasized that data-adaptive hypothesis testing, while avoiding an assumed model for the data,
requires a representative training set for successful definition of the test.

This report applies a particular data-adaptive hypothesis test, the neural net, to the distributed
sensor fusion architecture. Relative to the now-conventional neural net taxonomy [8,10],
only mapping neural networks such as the multilayer perceptron [11] and back propagation [12-16]
are considered. These nets differ from the associative Hopfield-type nets [17,18] by applying supervised
learning (adaption) toward the performance of a functional mapping without feedback [8].
In hypothesis testing the desired map is from the input data space to an output hypothesis space.
Alternative neural net architectures, such as those employing Kohonen learning [8,19], attempt to
store data distributions internally rather than directly performing the data input-hypothesis space
output mapping. It has generally been found that neural net classifiers perform as well as conventional
techniques on a variety of problems, including linear, Gaussian, and k-nearest neighbor
algorithms [10,20-24]. More generally, neural nets have been configured to perform the maximum
a posteriori probability [25] and maximum likelihood tests [26] for arbitrary input distributions.

Figure 1 is a generic architecture for distributed multisensor neural net data fusion consisting
of a sensor neural net (SNN) for each detector simultaneously observing a stochastic phenomenon.
Each SNN is trained to the output decision space $\{H_1, \ldots, H_Q\}$ from a training set consisting of
the corresponding sensor data. The output of the SNN consists of a normalized vector $(a_1, \ldots, a_Q)$
where the largest $a_i$ determines the hypothesis $H_i$. After all SNNs are trained, an independent data
set is propagated through the SNNs to form an input training set for the fusion neural net (FNN).
The FNN input consists of an analog $Q \times M$ vector, corresponding to $Q$ decisions for each of $M$
sensors. The FNN output consists of the vector $(f_1, \ldots, f_Q)$, such that the largest $f_i$ implies an
overall system decision for hypothesis $H_i$. Note that the FNN performs cluster analysis in the $QM$-dimensional
input space, which for hypothesis $H_i$ is clustered to the vector $(0, \ldots, 0, 1, 0, \ldots, 0)$,
with the 1 in position $i$, for each of $M$ sensors.

Figure 1. Generic neural net sensor fusion architecture for distributed sensor processing.
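
The data flow from the SNN outputs to the FNN input can be sketched as follows. This is a minimal illustration, not the report's implementation: the function names (`normalize`, `decide`, `fusion_input`) and the example outputs are hypothetical, and the sum-to-one normalization is an assumption, since the text says only that the SNN output is a normalized vector.

```python
from typing import List, Sequence


def normalize(outputs: Sequence[float]) -> List[float]:
    """Scale raw SNN outputs to sum to one (assumed form of normalization)."""
    total = sum(outputs)
    return [o / total for o in outputs]


def decide(outputs: Sequence[float]) -> int:
    """Return the 0-based index of the largest output, i.e. the chosen hypothesis."""
    return max(range(len(outputs)), key=lambda k: outputs[k])


def fusion_input(snn_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate M normalized Q-vectors into a single Q*M-vector for the FNN."""
    vec: List[float] = []
    for out in snn_outputs:
        vec.extend(normalize(out))
    return vec


# Hypothetical raw outputs from M = 3 sensor nets with Q = 3 hypotheses each.
raw = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.7, 0.2, 0.1]]
x = fusion_input(raw)                      # 9-element analog FNN input
sensor_decisions = [decide(r) for r in raw]
```

Keeping the analog values, rather than only the hard per-sensor decisions, is what lets the FNN cluster in the full $QM$-dimensional space.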


To motivate the neural net sensor fusion architecture in Figure 1, the system is applied to a
problem for which a classical test is formulated. Section 2 discusses the neural net detection of a
transition in the standard deviation of Gaussian noise. The process standard deviations before and
after the supposed transition are assumed to be sensor dependent. The input to the SNNs consists of
windowed sample variances from before and after the (supposed) transition. The mapping is from a
$\chi^2$-distributed pair $(x_1, x_2)$ to a decision space output (1,0) for transition and (0,1) for no transition.
Figure 2 is a schematic of the transition test mapping, which is denoted SXOR (for stochastic
exclusive-or). It is easily shown that the test requires a classifier bilinear in $x_1$ and $x_2$, which is
implemented by a second-order neural net algorithm [11]. In addition to requiring a nontrivial
neural net, the variance transition problem is sufficiently tractable to allow an analytic solution for
the classical test performance. False alarm and detection probabilities are expressed in terms of the
threshold parameter used in the hypothesis test. An optimum threshold, corresponding roughly
by definition to maximum detection and locally minimum false alarm probabilities, is computed
for a number of different noise and sampling conditions. Section 2 compares the performance of
the sensor and fusion neural nets to the classical test probabilities. Chair and Varshney [3] show
that optimum data fusion is implemented by a linear combination of the SNN decision outputs
followed by the application of a threshold. The optimum weight vector for the linear combination
is a function of the performance probabilities of the SNNs. This fusion algorithm is equivalent to
a first-order perceptron for which the weight vector can be adapted by the perceptron learning
algorithm [5,11]. The performance of the neural net sensor fusion system on the SXOR problem
with back propagation SNNs and the optimum perceptron FNN is also discussed in Section 2.
Motivated by the fact that the optimum fusion algorithm is a neural net, a back propagation FNN
was trained on the SNN outputs. Both the optimum and back propagation FNNs matched or
exceeded the higher performing SNN in the data fusion, justifying the use of neural networks in
the distributed sensor fusion system.

Figure 2. Schematic of variance transition test mapping: stochastic exclusive-or
(SXOR). Input N-sample window variances

Section 3 applies the fusion system architecture shown in Figure 1 to the detection of object
deployments during the Firefly (FF) launches that occurred on 29 March (FFI) and 20 October
1990 (FFII) from Wallops Island, Virginia (as depicted in Figure 3). The launches presented
a rare opportunity for data fusion due to the simultaneous observation by the three Millstone
Hill (Westford, Massachusetts) radars: the Haystack X-band imaging, Firepond CO2 laser, and
Millstone L-band tracking. In applying the sensor fusion architecture to the FF data, two back
propagation SNNs were trained on the deployments using range-Doppler images derived from the
Haystack X-band and Firepond CO2 laser radar data. A third SNN had as input the passive-IR
spectral simulation of the deployments consisting of the spectral irradiance of the objects in the
range [5 μm, 25 μm]. The range-Doppler images contained information on object segmentation, whereas
the passive-IR simulation was sensitive to changes in the exposed object material composition. The
sensor fusion system neural output consisted of a decision among the possibilities of predeployment
(1,0,0), deployment (0,1,0), and postdeployment (0,0,1). In accordance with Figure 1, the FNN
had nine inputs, three sensors with three possible decisions, and three output neurons for an overall
deployment decision. The system is applied to deployment detection of an inflated balloon with
training and performance data sets from the same launch (FFI). The performance of the entire
sensor fusion system is compared to that of the SNN for each sensor to observe evidence of sensor
synergism through data fusion. The application of the system to a canister deployment detection,
in which the training and performance sets were taken from different launches (FFI and FFII,
respectively), is also discussed.

Figure 3. Firefly experiment launch sequence: Phases I (predeployment), II (deployment),
and III (postdeployment) for canister-payload and balloon-canister.

Sections  2  and  3  contain  the  systematic  theoretical  and  experimental  analyses  of  neural  net 
processing  in  the  increasingly  relevant  distributed  sensor  environment.  A  conclusion  follows  in 
Section  4,  and  the  appendix  contains  the  Bayesian  analysis  of  the  SXOR  test. 
2. SXOR BENCHMARK FOR NEURAL NET DATA FUSION


This section considers a quantitative comparison of neural net and classical hypothesis testing
in the distributed sensor architecture. The SXOR test map is interesting because it requires a
nontrivial neural net implementing at least a second-order classifier and yet is mathematically
tractable. In addition, the detection of noise deviation transitions reflects a common situation in
nonstationary signal processing [27].

2.1  False  Alarm  and  Detection  Probability  for  SXOR 

False alarm and detection probabilities are related to the threshold parameter of the SXOR
test and the properties of the noise sampling. The sufficient statistic for a zero mean Gaussian
process $\{y_i \mid i = 1, \ldots, N\}$ is the sample variance [28]

$$x = \frac{1}{N}\sum_{i=1}^{N}(y_i - \bar{y})^2 \tag{1}$$

where $\bar{y}$ is the mean. The sample variance is distributed with a probability density


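$$p(x) = \frac{1}{\Gamma\!\left(\frac{N-1}{2}\right)}\left(\frac{N}{2\sigma^2}\right)^{\frac{N-1}{2}} x^{\frac{N-3}{2}} \exp\!\left(-\frac{Nx}{2\sigma^2}\right) \tag{2}$$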


where $\sigma$ is the standard deviation of the Gaussian random process $\{y_i\}$, and $\Gamma$ is the gamma
function. The classic test of distinguishing between two deviations, $\sigma_0$ and $\sigma_1$, results from a
threshold $\gamma$: $x$ greater (less) than $\gamma$ implies noise deviation $\sigma_1$ ($\sigma_0$).
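
The single-window threshold test can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the $1/N$ normalization of the sample variance is an assumption about the form of Equation (1).

```python
import random


def sample_variance(window):
    """Sample variance of one window (1/N normalization is an assumption)."""
    n = len(window)
    mean = sum(window) / n
    return sum((y - mean) ** 2 for y in window) / n


def window_decision(window, gamma):
    """Return 1 (high deviation) if the variance exceeds gamma, else 0 (low)."""
    return 1 if sample_variance(window) > gamma else 0


# Illustrative windows of N = 20 zero-mean Gaussian samples (seeded for repeatability).
rng = random.Random(0)
low_window = [rng.gauss(0.0, 1.0) for _ in range(20)]    # deviation sigma_0 = 1
high_window = [rng.gauss(0.0, 4.0) for _ in range(20)]   # deviation sigma_1 = 4
decisions = (window_decision(low_window, 4.0), window_decision(high_window, 4.0))
```

The threshold value 4.0 is arbitrary here; in the text the threshold is swept to trade detection against false alarm.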

The computation of performance probabilities for the SXOR requires the conditional probabilities
$\{p[(i,j)|(q,m)] \mid i,j,q,m \in \{0,1\}\}$, where each pair $(i,j)$ corresponds to a (before, after) variance
condition. The index $i$ of 0 or 1 denotes a windowed sample variance from a low ($\sigma_0$) or high ($\sigma_1$)
deviation process, respectively. The conditional probability $p[(i,j)|(q,m)]$ represents the detection
of a noise condition $(i,j)$ when the (before, after) windows truly correspond to the condition $(q,m)$.
The hypothesis is tested on two data windows of length $N$ from before and after the supposed variance
transition. Assuming independent tests on each window, the conditional probabilities factor
according to the equation $p[(i,j)|(q,m)] = p(i|q)\,p(j|m)$, where $p(j|m)$ denotes the probability of
choosing noise deviation $\sigma_j$ for a single window with deviation $\sigma_m$. The pair of decisions necessary
to determine a transition is based on the value of $x$ in Equation (1) for two data windows and the
threshold $\gamma$ (as described above).
The appendix relates the false alarm and detection probabilities for variance transition detection
to the conditional probabilities on a single window $p(j|m)$. The conditional probabilities
for the transition hypothesis test are shown to be given by

$$P_d = p(\text{transition}|\text{transition}) = p(1|1)\,p(0|0) + p(0|1)\,p(1|0) \tag{3}$$

and

$$P_f = p(\text{transition}|\text{no transition}) = p(1|1)\,p(0|1) + p(1|0)\,p(0|0), \tag{4}$$

where it is assumed that the four possible noise conditions $\{(i,j) \mid i,j \in \{0,1\}\}$ have equal prior
probability. The conditional probabilities appearing in Equations (3) and (4) are given by

$$p(1|i) = \int_{\gamma}^{\infty} p_i(x)\,dx \tag{5}$$

and

$$p(0|i) = \int_{0}^{\gamma} p_i(x)\,dx, \tag{6}$$

where $p_i$ is the function $p(x)$ in Equation (2) with $\sigma$ equal to $\sigma_i$.
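
The relation between the single-window probabilities and the transition probabilities $P_d$ and $P_f$ can be checked numerically. The sketch below is not the numerical integration used in the report: it estimates $p(1|i)$ by Monte Carlo simulation instead, assumes a $1/N$-normalized sample variance, and uses hypothetical function names and an arbitrary threshold.

```python
import random


def sample_variance(window):
    """Sample variance of one window (1/N normalization is an assumption)."""
    n = len(window)
    mean = sum(window) / n
    return sum((y - mean) ** 2 for y in window) / n


def p_high(sigma, n, gamma, trials, rng):
    """Monte Carlo estimate of p(1|i): fraction of windows with variance > gamma."""
    hits = 0
    for _ in range(trials):
        window = [rng.gauss(0.0, sigma) for _ in range(n)]
        if sample_variance(window) > gamma:
            hits += 1
    return hits / trials


def transition_probabilities(p1_given_0, p1_given_1):
    """Detection and false alarm probabilities from the single-window ones."""
    p0_given_0 = 1.0 - p1_given_0
    p0_given_1 = 1.0 - p1_given_1
    p_d = p1_given_1 * p0_given_0 + p0_given_1 * p1_given_0   # Equation (3)
    p_f = p1_given_1 * p0_given_1 + p1_given_0 * p0_given_0   # Equation (4)
    return p_d, p_f


rng = random.Random(1)
gamma, n = 2.5, 10                        # illustrative threshold and window length
p10 = p_high(1.0, n, gamma, 2000, rng)    # sigma_0 = 1
p11 = p_high(4.0, n, gamma, 2000, rng)    # sigma_1 = 4
p_d, p_f = transition_probabilities(p10, p11)
```

Sweeping `gamma` over a grid and plotting `p_d` and `p_f` would give curves of the kind described for the threshold dependence of the test.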

As the threshold is varied, the behavior of $P_d$ and $P_f$ in Equations (3) and (4) characterizes
the hypothesis test [28], which for this problem is the determination of a high/low or low/high
variance transition. The test is a stochastic version of the binary exclusive-or map, which has
historically been important in neural net research [11,12]. The central importance of this map
derives from the concept of linear separability [11]. Embedding the input sample variances $(x_1, x_2)$
in a higher dimensional space $(x_1, x_2, x_1 x_2)$ enhances the linear separability of the $\chi^2$-distributed
input data. This fact suggests that the transition detection classifier is bilinear in the
input pair $(x_1, x_2)$ and, therefore, that the perceptron realization of the map is necessarily second
order [11]. It is emphasized that intermediate single-window variance decisions, high or low, are
not performed in the test so that the map is different from the conventional Gaussian classifier.
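
The effect of the product term is easiest to see on the deterministic binary exclusive-or, which no linear threshold on $(x_1, x_2)$ alone can realize. The sketch below uses hand-picked, hypothetical weights and bias; it illustrates the embedding idea only and is not the trained second-order classifier of the report.

```python
def embed(x1, x2):
    """Second-order embedding (x1, x2) -> (x1, x2, x1 * x2)."""
    return (x1, x2, x1 * x2)


def linear_classifier(z, weights=(1.0, 1.0, -2.0), bias=-0.5):
    """Single linear threshold unit acting on the embedded input."""
    s = sum(w * zi for w, zi in zip(weights, z)) + bias
    return 1 if s > 0 else 0


# Truth table produced by the linear unit in the embedded space.
table = {(a, b): linear_classifier(embed(a, b)) for a in (0, 1) for b in (0, 1)}
```

In the embedded space the classifier is linear, while as a function of the original inputs it is bilinear, which is the sense in which the perceptron realization is second order.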

Figures 4 and 5 plot false alarm and detection probability as a function of threshold $\gamma$ for $\sigma_0$
of one and $\sigma_1$ of two and four, respectively. The conditional probabilities were derived for various
window lengths $N$ by numerical computation of Equations (3) through (6). Note that for deviation
$\sigma_1$ and/or window of sufficient size $N$, the peak of the detection probability occurred near the local
minimum in the false alarm probability. The experimental results for neural net performance on
the SXOR problem indicate convergence to this region of peak detection and locally minimum false
alarm probability.

Figure 4. False alarm and detection probability versus threshold $\gamma$ for SXOR test, with
$\sigma_0 = 1$, $\sigma_1 = 2$, and $N$ = (a) 2, (b) 6, (c) 10, and (d) 20.


2.2  Back  Propagation  Neural  Net  Performance 

Figure 5. False alarm and detection probability versus threshold $\gamma$ for SXOR test, with
$\sigma_0 = 1$, $\sigma_1 = 4$, and $N$ = (a) 2, (b) 6, (c) 10, and (d) 20.

Figure 6 is a back propagation neural net suitable for hypothesis testing on an input P-vector
of data-derived parameters. The desired output for an input vector corresponding to hypothesis
$H_i$, $i = 1, \ldots, Q$, is the vector $(0, \ldots, 0, 1, 0, \ldots, 0)$, with the 1 in position $i$, as obtained from
the $Q$ output (deepest layer) neurons. In addition to the input and output neuron layers, the back
propagation net contains so-called "hidden" layers. The adjustable parameters on the net consist of
a threshold for every neuron in the net and connection weights between neurons on adjacent layers [12].
During forward propagation (left to right) a neuron with threshold $\theta$ applies the sigmoid function

$$f(I) = \frac{1}{1 + \exp(-I + \theta)} \tag{7}$$

to the input $I$ consisting of the weighted sum of the neuron outputs from the leftward adjacent
layer. Net adaption consists of varying the connection weights and thresholds until the output of
the deepest layer neurons matches the desired output for all elements of the training set. Details
of the back propagation algorithm, which is derived from the gradient descent minimization of the
difference between net output and target over the training set, are found in Rumelhart et al. [12].
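
Forward propagation through a net of sigmoid neurons can be sketched as below. The layer sizes follow the 2-16-2 net used later in this section; everything else (function names, the random initial weights and thresholds, the example input) is hypothetical, and no training step is shown.

```python
import math
import random


def sigmoid(i, theta):
    """Neuron response of Equation (7) for weighted input i and threshold theta."""
    return 1.0 / (1.0 + math.exp(-i + theta))


def layer(inputs, weights, thresholds):
    """Outputs of one layer: each neuron applies the sigmoid to its weighted sum."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)), theta)
        for row, theta in zip(weights, thresholds)
    ]


def forward(x, net):
    """Propagate the input vector left to right through all layers."""
    for weights, thresholds in net:
        x = layer(x, weights, thresholds)
    return x


def random_net(sizes, rng):
    """Untrained net with random weights and thresholds (illustrative only)."""
    net = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        thresholds = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
        net.append((weights, thresholds))
    return net


rng = random.Random(0)
net = random_net([2, 16, 2], rng)     # 2 inputs, 16 hidden neurons, 2 outputs
out = forward([1.3, 0.7], net)        # hypothetical pair of sample variances
```

Here the input layer simply passes its values on, so the sigmoid is applied only in the hidden and output layers.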

Figure 6. Back propagation neural net for hypothesis testing: P-vector input data and
Q-vector hypothesis output, $(0, \ldots, 0, 1, 0, \ldots, 0) \mapsto H_i$.

Figure 7. Cost versus iteration curve for back propagation learning of the SXOR map; 2
input, 16 hidden layer, and 2 output neurons in a 100-element training set; $\sigma_0 = 1$, $\sigma_1 = 4$,
$N = 10$.

It has been shown that a three-layer back propagation net is sufficient to implement any
reasonable functional mapping between input and output vectors [29]. Note from Equation (7) that
an undulation of the mapping is realizable by the expression $f(I) - f(I + \Delta)$ for a constant threshold
$\Delta$. This roughly suggests that two middle layer neurons are required for each oscillation in the map;
however, performance at a Bayesian optimum is not guaranteed by a network that performs an exact
mapping for every element in a stochastic training set [9].

To test the performance of the back propagation algorithm on the SXOR map, a net with 2 input
neurons, 16 middle layer neurons, and 2 output neurons ($Q = 2$) was used. The training rate and
smoothing parameters ($\eta$ and $\alpha$ in Rumelhart et al. [12]) were chosen to be 0.5 and 0.2 by
experimentation with various input sets. The
input consisted of sample variances from a training set of Gaussian random noise segments with $\sigma_0$
of one and $\sigma_1$ of either two or four. The sample variances were computed from windows of length
$N$ given by 2, 4, 6, 8, 10, and 15. For each noise pair ($\sigma_0$ and $\sigma_1$) and window $N$, two training
ensembles each of sizes 400, 800, and 1200 were created with deviation pairs in the order (1,1),
(1,0), (0,0), and (0,1). The two third-layer neurons were trained to output values 1 and 0 for the
(1,1) and (0,0) inputs, and the output targets were reversed for input corresponding to (1,0) and
(0,1). The cost function $C$, consisting of the summed differences of third-layer outputs and targets,
was monitored during training to determine a point beyond which it did not decrease. Figure 7 is
a typical cost versus iteration curve for a 100-element training set with $\sigma_1$ of 4 and window length
$N$ of 10. Also included is the so-called "Hamming error" versus iteration plot, which is defined as
the number of decision errors (within 1%) over the training set. As suggested in Hecht-Nielsen [8],
nets were trained for a large number (> 30,000) of iterations (defined as a single adaption of all
net parameters for every element in the training set) and the point of minimum cost was chosen
as the optimum. The implementation of the desired training set map, corresponding to $C \to 0$,
was often not attained with 16 middle layer neurons; however, the Bayesian optimum was obtained
through the net learning of data biases rather than each undulation in the training set map [9].

A network with too many hidden layer neurons often had plateaus in the cost function in Figure
7, which was probably due to the phenomenon of "neuron paralysis" [29] that occurs at a neuron
when the input is at the tail of the threshold function in Equation (7). In this case, connection
and threshold parameter adaption have little effect on the neuron output, hence the cost function
remains constant [30].^ It was found that, whereas extremely long training sometimes resulted
in downward jumps in the cost function, the network performance on the test was not improved.
Often the only effect was an increase in detection probability with a simultaneous increase in false
alarm probability and vice versa. A discussion of techniques to avoid neuron paralysis and other
obstacles is presented in [31].


^On the exclusive-or map a 64-neuron hidden layer had 4% paralyzed runs and a 128-neuron hidden
layer had 78% paralyzed runs.
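
Neuron paralysis can be made concrete by looking at the slope of the sigmoid, which scales every gradient descent update passing through a neuron. The sketch below uses the standard identity $f'(I) = f(I)\,[1 - f(I)]$ for the logistic function; the function names and the sampled input values are illustrative.

```python
import math


def sigmoid(i, theta=0.0):
    """Logistic threshold function of Equation (7)."""
    return 1.0 / (1.0 + math.exp(-i + theta))


def sigmoid_slope(i, theta=0.0):
    """Derivative of the sigmoid with respect to its input: f * (1 - f)."""
    f = sigmoid(i, theta)
    return f * (1.0 - f)


# Slope at the center of the sigmoid and progressively further into its tail.
slopes = {i: sigmoid_slope(i) for i in (0.0, 5.0, 10.0, 20.0)}
```

The slope is 0.25 at the center and falls by many orders of magnitude in the tail, so weight and threshold updates for a saturated neuron become negligible and the cost function stalls.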
For each parameter set $\sigma_0$, $\sigma_1$ and window length $N$, networks trained on ensembles of length
400, 800, and 1200 were performance tested. A performance set with 1200 variance pairs was input
to the trained net, and for each pair the largest neuron output determined whether transition
or no transition was chosen. The proportion of correctly and incorrectly chosen transitions then
determined the detection $P_d$ and false alarm $P_f$ probabilities for the test. The performance
probabilities for nets trained on three different sets (of length 400, 800, and 1200) were averaged. The
combination of training sets of different size minimized the dependence of the network performance
estimate on training set size. Figures 8 and 9 plot $P_d$ and $P_f$ versus $N$ as estimated from the
performance sets for $\sigma_1$ of two and four, respectively; the dotted curves correspond to the classical
optimum defined in Section 2.1. As seen in Figures 8 and 9, the back propagation network closely
approximated the performance at the peak $P_d$ and locally minimum $P_f$ in Figures 4 and 5. This
behavior is understood by the equal contribution of $H_0$ and $H_1$ errors over the training set in the
cost function $C$ [12].

Figure 8. SXOR false alarm and detection probability versus window size $N$ for back
propagation NN and classical test: $\sigma_0 = 1$, $\sigma_1 = 2$.

Figure 9. SXOR false alarm and detection probability versus window size $N$ for back
propagation NN and classical test: $\sigma_0 = 1$, $\sigma_1 = 4$.
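
The performance test described above amounts to counting decisions over a labeled performance set. A minimal sketch follows, assuming the (1,0) output target denotes a transition; the function names and the four-element example set are hypothetical.

```python
def chose_transition(outputs):
    """Decision by largest output; the (1, 0) target is taken to mean transition."""
    return outputs[0] > outputs[1]


def performance_probabilities(net_outputs, is_transition):
    """Estimate detection and false alarm probabilities from decision counts."""
    n_transition = sum(1 for truth in is_transition if truth)
    n_no_transition = len(is_transition) - n_transition
    detections = 0
    false_alarms = 0
    for outputs, truth in zip(net_outputs, is_transition):
        if chose_transition(outputs):
            if truth:
                detections += 1
            else:
                false_alarms += 1
    return {
        "Pd": detections / n_transition,
        "Pf": false_alarms / n_no_transition,
    }


# Hypothetical net outputs for a tiny performance set with known hypotheses.
outputs = [(0.9, 0.1), (0.4, 0.6), (0.7, 0.3), (0.2, 0.8)]
truth = [True, True, False, False]
probs = performance_probabilities(outputs, truth)
```

Averaging the resulting dictionaries over nets trained on ensembles of different size gives the kind of averaged estimate described in the text.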

2.3  SXOR  Data  Fusion 

Section 2.2 demonstrated the optimum performance of a back propagation neural net on the
SXOR test, which requires a bilinear classification of the input sample variances. This network
corresponds to a forward-based SNN in the distributed sensor fusion architecture in Figure 1.
A description of FNN training and performance with input taken from two SXOR-trained SNNs
follows. The results indicate the enhancement of variance transition detection performance obtained
through distributed sensor data fusion.

In Chair and Varshney [3], an optimum data fusion rule for a binary decision was obtained
within the distributed sensor processing architecture. As derived from the log-likelihood ratio test,
assuming sensor processor $i$, $i = 1, \ldots, M$, outputs $u_i$ of $-1$ or $+1$ for decision $H_0$ or $H_1$, the data
fusion rule is [3]

$$f(u_1, \ldots, u_M) = \begin{cases} +1 & \text{if } a_0 + \sum_{i=1}^{M} a_i u_i > 0 \\ -1 & \text{otherwise,} \end{cases} \tag{8}$$

where the coefficients $a_i$, $i = 1, \ldots, M$, are given by

$$a_i = \frac{1}{2}\log\frac{(1 - P_{m_i})(1 - P_{f_i})}{P_{m_i} P_{f_i}} \tag{9}$$

and

$$a_0 = \log\frac{P_1}{P_0} + \frac{1}{2}\sum_{i=1}^{M}\log\frac{P_{m_i}(1 - P_{m_i})}{P_{f_i}(1 - P_{f_i})} \tag{10}$$

with $P_0$ and $P_1$ the prior probabilities of $H_0$ and $H_1$, and $P_{m_i}$ and $P_{f_i}$ the miss and false alarm
probabilities of the $i$th sensor processor. The architecture implied by Equations (8) through (10) is,
in fact, a first-order perceptron [5,11] that can be realized through the adaption of the connection
weights $a_i$, $i = 0, \ldots, M$, by training. To implement perceptron learning for input $u_i = \pm 1$,
$i = 1, \ldots, M$, define the normalized predicate vector $\Phi = (1, u_1, \ldots, u_M)/\sqrt{M+1}$ and connection
weight $(M+1)$-vector $A = (a_0, \ldots, a_M)$. After a training set element input vector is propagated
through each SNN, and the SNN decisions are determined by the largest neuron outputs, the dot
product $\Phi \cdot A$ is computed. In the case of a correct FNN decision, $\Phi \cdot A > 0$ ($< 0$) for $\Phi$ corresponding
to $H_1$ ($H_0$), the connection weight vector is not changed ($A' = A$). For an incorrect FNN decision
the connection weight vector is altered by the normalized predicate vector, $A' = A \pm \Phi$, where $+$ ($-$)
corresponds to $\Phi \cdot A < 0$ ($> 0$) for $\Phi$ corresponding to $H_1$ ($H_0$). An iteration of the perceptron
adaption algorithm consists of the application of the above algorithm for every element in the
training set [11]. Training continues until the FNN performs a correct decision for the entire set or
until the FNN performance does not improve.
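
Both routes to the first-order perceptron FNN can be sketched together: weights computed from the sensor miss and false alarm probabilities in the Chair and Varshney log-likelihood form, and weights adapted by the perceptron rule described above. The function names, the equal priors, and the example sensor probabilities are assumptions made for illustration.

```python
import math


def chair_varshney_weights(p_miss, p_false, prior_h1=0.5):
    """Fusion weights [a_0, a_1, ..., a_M] for sensor decisions u_i = +1 or -1."""
    a0 = math.log(prior_h1 / (1.0 - prior_h1))
    weights = []
    for pm, pf in zip(p_miss, p_false):
        a0 += 0.5 * math.log(pm * (1.0 - pm) / (pf * (1.0 - pf)))
        weights.append(0.5 * math.log((1.0 - pm) * (1.0 - pf) / (pm * pf)))
    return [a0] + weights


def fuse(a, u):
    """Fusion rule: +1 (H1) if a_0 + sum(a_i * u_i) > 0, otherwise -1 (H0)."""
    s = a[0] + sum(ai * ui for ai, ui in zip(a[1:], u))
    return 1 if s > 0 else -1


def perceptron_epoch(a, samples):
    """One pass of the perceptron rule over (u, target) pairs, target = +1 or -1.

    Returns the updated weight vector and the number of wrong decisions.
    """
    a = list(a)
    errors = 0
    for u, target in samples:
        norm = math.sqrt(len(u) + 1)
        phi = [1.0 / norm] + [ui / norm for ui in u]   # normalized predicate vector
        dot = sum(p * w for p, w in zip(phi, a))
        decision = 1 if dot > 0 else -1
        if decision != target:
            errors += 1
            a = [w + target * p for w, p in zip(a, phi)]   # A' = A +/- Phi
    return a, errors


# Illustrative pair of sensors: one informative, one at chance level.
optimum = chair_varshney_weights(p_miss=[0.24, 0.5], p_false=[0.38, 0.5])

# Perceptron training on a toy set in which the correct answer follows sensor 1.
samples = [((1, 1), 1), ((1, -1), 1), ((-1, 1), -1), ((-1, -1), -1)]
learned, errors = [0.0, 0.0, 0.0], None
for _ in range(10):
    learned, errors = perceptron_epoch(learned, samples)
    if errors == 0:
        break
```

In this example the chance-level sensor receives zero weight, so the fused decision follows the informative sensor alone.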

The architecture for the fusion of two SXOR-trained back propagation SNNs is shown in
Figure 10. It is assumed that the high noise deviation $\sigma_1$ is sensor-dependent, so that each SNN
was previously trained on a different variance pair $\sigma_0$ (= 1) and $\sigma_1$. For each window size $N$
(= 2, 4, 6, 8, 10) a pair of SNNs was trained on sample variances with $\sigma_1$ of two and four.
As in the experiment described in Section 2.2, the SNN target outputs were (1,0) and (0,1) for
transition and no transition, respectively. A performance set of 1000 variance pairs each was
used to compute the SNN detection $P_d$, false alarm $P_f$, miss $P_m$, and correct no transition $P_{cH_0}$
probabilities for the test. The SNN decisions were determined by the largest neuron output. A
plot of these performance probabilities for the ($\sigma_1 = 4$) SNN and ($\sigma_1 = 2$) SNN as a function
of window size $N$ is shown in Figure 11. A determination of the SNN detection and false alarm
probabilities allowed the definition of an optimum perceptron FNN from Equations (8) through
(10). An estimate of the perceptron FNN performance was obtained with 1000 variance quartets
$\{((i,j),(i',j')) \mid i,j \in \{0,1\}\}$. Quartet $((i,j),(i',j'))$ corresponds to input variance pairs $(i,j)$ and
$(i',j')$ for the SNNs with $\sigma_1$ of two and four, respectively. Recall that $i$ of 1 (0) corresponds to
the choice of a high (low) noise deviation in the definition of the sampled variance. The output
SNN decision was converted to $u_i = \pm 1$, $i = 1, 2$ (as in Figure 10) before input to the perceptron
FNN. Figure 11 shows the perceptron FNN performance as a function of $N$ as determined from
the performance set. Note that the FNN matched the performance of the ($\sigma_1 = 4$) SNN for $N$ of
2, 4, 6, 8, and 10. The ($\sigma_1 = 2$) SNN has a small effect on the optimum FNN due to the generally
poor performance of the net ($P_m \simeq P_f \simeq 0.5$).

Figure 10. Fusion architecture for SXOR test: SNN1 ($\sigma_0 = 1$, $\sigma_1 = 2$), SNN2 ($\sigma_0 = 1$,
$\sigma_1 = 4$), and FNN first-order perceptron.

Figure 11. Performance probabilities for the fusion of two SXOR-trained SNNs: $P_d$, $P_f$,
$P_m$, and $P_{cH_0}$ versus window size $N$ for $\sigma_1 = 2$ and $\sigma_1 = 4$ SNNs, optimum FNN, and
back propagation FNN.

Motivated by the representation of the optimum FNN as a perceptron, a back propagation
FNN (BPFNN) was defined for the data fusion of the two SXOR-trained SNNs. The BPFNN consisted
of 4 inputs (2 from each SNN), a 16-neuron hidden layer, and 2 output neurons. The BPFNN
was trained on 25 randomly generated variance quartets in the order ((0,0),(0',0')), ((1,0),(1',0')),
((1,1),(1',1')), and ((0,1),(0',1')). Each variance quartet was propagated through the SNNs and
normalized to define the 4-element input to the BPFNN. As in the case of the SNNs, the BPFNN
targets were (1,0) and (0,1) for transition and no transition, respectively. To speed up training for
an FNN with only 16 hidden neurons, the variance quartets from the overlapped region of the input
domain were removed from the training set. This procedure usually suffices to obtain Bayesian
optimum performance through the learning of data biases [9]. After BPFNN training, a performance
set of 1000 random variance quartets was generated and propagated through the entire sensor
fusion system. A count of correctly and incorrectly detected transitions and no transitions over
the performance set determined the conditional probabilities plotted in Figure 11. Note that the
trained BPFNN essentially matched the optimum FNN at the performance of the ($\sigma_1 = 4$) SNN
for window sizes 2 through 10. These results suggest that the trained distributed sensor fusion
system attained at least the performance of the strongest sensor at any time.

To demonstrate performance enhancement through data fusion, the fusion of two ($\sigma_1 = 4$)
SNNs trained on data of window length 2 was considered. A three-layer BPFNN was trained on the
SNN pair outputs from 100 input variance quartets. As in the training above, variance quartets
from the overlapped regions of the input domain were discarded from the training set. This
procedure required a BPFNN training time of about 30 min on the Silicon Graphics Workstation.
The BPFNN performance probabilities were computed from 100 independent performance sets, each
consisting of 100 variance quartets. Averaged BPFNN performance probabilities ($P_d$, $P_f$,
$P_m$, and the probability of a correct no-transition decision) given by (0.84, 0.11, 0.16, 0.87)
were obtained for comparison with the ($\sigma_1 = 4$) SNN performance set (0.76, 0.38, 0.24, 0.62).
In applying the same training and performance procedure to two fused ($\sigma_1 = 4$) SNNs with
window length 4, BPFNN averaged probabilities given by (0.86, 0.12, 0.14, 0.88) were obtained.
These values are compared against a window length 4 ($\sigma_1 = 4$) SNN performance set of
(0.84, 0.19, 0.16, 0.81) and, therefore, a BPFNN performance enhancement of up to 70% over the
individual sensor nets was demonstrated. The results in Section 3, in which neural net fusion is
applied to the FF launches, also support this conclusion.
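
The counting procedure used on the performance sets can be sketched as follows. This is a minimal illustration with hypothetical names and a toy eight-sample performance set. The last two lines compute relative reductions from the window length 2 probabilities quoted above; reading the "up to 70%" figure as the relative reduction in false alarm probability is an assumption made here, not something the text states.

```python
def conditional_probabilities(truth, decided):
    """Estimate decision probabilities from a labeled performance set.

    truth, decided: equal-length sequences of booleans, True = 'transition'.
    """
    n_transition = sum(1 for t in truth if t)
    n_no_transition = len(truth) - n_transition
    detected = sum(1 for t, d in zip(truth, decided) if t and d)
    false_alarms = sum(1 for t, d in zip(truth, decided) if not t and d)
    p_d = detected / n_transition
    p_f = false_alarms / n_no_transition
    return {
        "P_d": p_d,                  # transition declared, transition present
        "P_f": p_f,                  # transition declared, none present
        "P_m": 1.0 - p_d,            # missed transition
        "P_correct_no": 1.0 - p_f,   # correct no-transition decision
    }


def relative_reduction(before, after):
    """Fractional reduction of an error probability."""
    return (before - after) / before


# Toy performance set (hypothetical): 4 transitions, 4 no-transitions.
truth = [True] * 4 + [False] * 4
decided = [True, True, True, False, True, False, False, False]
example = conditional_probabilities(truth, decided)

# Window length 2 values quoted in the text: single SNN versus fused BPFNN.
false_alarm_gain = relative_reduction(0.38, 0.11)
miss_gain = relative_reduction(0.24, 0.16)
```

Under this reading, the false alarm probability falls by roughly 71% and the miss probability by roughly 33%.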


3.  FIREFLY  SENSOR  FUSION  EXPERIMENT 


This section applies the distributed sensor fusion architecture described in Section 2 to a
three-sensor fusion of measurements during the recent FF launch. The experiment, involving the
complicated logistics of three-radar imaging and tracking, provided a rare opportunity to
demonstrate the power of neural net sensor fusion.

3.1  Firefly  Experiment 

The FF experiment consisted of two rocket launches (FFI on 29 March and FFII on 20 October
1990) from Wallops Island into the Atlantic Ocean about 400 km eastward. During the flight the
deployment of an inflatable balloon was observed simultaneously by the three Millstone Hill radars
at a range of approximately 750 km from the targets. The active sensors were the Haystack X-band
($\lambda$ = 3 cm) and Firepond CO2 laser ($\lambda$ = 11.2 μm) imaging radars, and the Millstone
L-band ($\lambda$ = 23.1 cm) tracking radar.

About 6 min after the launch, a metallic canister (cross section ~ 1 m²) was deployed from
a much larger metallic payload. As the payload fell away from the track, the canister ejected four
metallic doors and an inflating carbon cloth cone (cross section ~ 2 m²). As shown in Figure 3,
the predeployment, deployment, and postdeployment phases are clearly identified for both canister
and balloon-canister payloads.

The input data for the sensor fusion system consisted of range-Doppler images from the
Haystack and Firepond radars and a passive-IR spectral simulation of the objects in the images.
Radar imaging takes advantage of a moving target's aspect angle change to obtain a signal Doppler
shift proportional to the scatterer cross-range extent. The Doppler resolution is proportional to
the inverse of the signal integration time, over which it is assumed that the scatterer has moved a
negligible distance and the signal is coherent. Through object motion analysis, the Doppler shift is
scaled to a physical cross-range distance [32,33]. This analysis is coupled with an estimate of the
range from the signal delay to obtain a 2D range-cross-range image of the object. The range-Doppler
technique results in image resolution greater than the limits imposed by the radar aperture and
radiation wavelength. Details of range-Doppler imaging theory for the Haystack and Firepond
radars are given in Ausherman et al. [34] and Kachelmyer [35], respectively.

The third sensor input to the sensor fusion system was from a passive-IR simulation of the
objects in the images. The Lincoln Laboratory-developed simulator was used to provide a feasibility
study for passive-IR deployment detection [36]. Inputs to the simulator included object shape,
dimensions, spin/precession rates, and orientation relative to the sun. The input thermal
properties were initial temperature, emissivity, interior emissivity, absorptance, thermal mass
(density × heat capacity), and specularity. Finally, a climate and cloud cover-dependent model of
earth spectral irradiation through the atmosphere was input. The output from the simulator
consisted of the object spectral irradiation into the solid angle over the range [5 μm, 25 μm]
in W/sr.

The spectral irradiation provided information about the material composition of an object. In
the range [5 μm, 25 μm], this information is indirect, through estimates of relative emissivity and
reflectance at the surface. Thus, for example, a metallic object with low emissivity and absorptance
($\epsilon \approx \alpha \approx 0$) has a spectrum dominated by reflected earthshine. As seen in
Figure 12, the metallic spectrum has notches at the ozone (~ 9.5 μm) and CO2 (~ 13 μm) wavelengths
due to atmospheric absorption of earthshine, as contrasted with a graybody object
($\epsilon \approx 0.75$) in Figure 13, in which a classic blackbody spectrum dominates the spectral
irradiance. Note from the spectra in Figures 12 and 13 that the graybody irradiance is about 20
times the reflected component for a 1-m² object, suggesting that the existence of a graybody object
among a set of metallic targets will dominate the total spectral irradiance. The balloon-canister
deployment sequence for the FFI launch, with the identification of the predeployment, deployment,
and postdeployment phases, is shown in Figure 14. Figure 15 illustrates the passive-IR simulation
from each phase: a reflective earthshine canister spectrum for predeployment, the superposition of
metallic door and carbon cloth (graybody) spectrum for deployment, and a graybody carbon cloth
spectrum for postdeployment. These simulated spectra form the training set for the passive-IR
SNN in the sensor fusion architecture discussed in Section 3.2.

Figure 12. Passive-IR simulated spectrum over the range [5 μm, 25 μm] in W/sr, plotted against
wavelength (μm). Metallic object with 1-m² cross section, $\epsilon = \alpha = 0$.
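
The graybody case can be illustrated with the Planck law scaled by emissivity and area. The sketch below is not the Lincoln Laboratory simulator: the 300 K surface temperature, the flat face-on geometry, and all function names are assumptions made for illustration, and reflected earthshine and atmospheric effects are ignored.

```python
import math

H = 6.62607015e-34    # Planck constant, J s
C = 2.99792458e8      # speed of light, m/s
K_B = 1.380649e-23    # Boltzmann constant, J/K


def planck_radiance(wavelength_um, temperature_k):
    """Blackbody spectral radiance in W / (m^2 sr um)."""
    lam = wavelength_um * 1e-6                      # wavelength in metres
    x = H * C / (lam * K_B * temperature_k)
    per_metre = 2.0 * H * C ** 2 / (lam ** 5 * math.expm1(x))
    return per_metre * 1e-6                         # per metre -> per micrometre


def graybody_intensity(wavelength_um, temperature_k, emissivity, area_m2):
    """Spectral radiant intensity in W / (sr um) of a flat graybody seen face-on."""
    return emissivity * area_m2 * planck_radiance(wavelength_um, temperature_k)


# Emissivity 0.75 and 1 m^2 follow the graybody example in the text;
# the 300 K temperature is an assumed value.
wavelengths = [5.0 + 0.1 * i for i in range(201)]   # 5 um to 25 um
spectrum = [graybody_intensity(w, 300.0, 0.75, 1.0) for w in wavelengths]
peak_wavelength = wavelengths[spectrum.index(max(spectrum))]
```

For the assumed temperature the spectrum peaks inside the [5 μm, 25 μm] band, close to the 9.5 μm ozone notch of the earthshine spectrum, which is one way to see why a graybody contribution can mask the notch structure.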

Figure 13. Passive-IR simulated spectrum over the range [5 μm, 25 μm] in W/sr. Graybody
object with 1-m² cross section, $\epsilon$ = 0.75, $\alpha$ = 9.9.

Figure 16 depicts the formulation of the fused sensor decision on balloon deployment from
Haystack and Firepond range-Doppler images and a passive-IR simulation. Due to the longer
Haystack coherent integration time, the range and cross-range resolutions of the Firepond and
Haystack radars were comparable. The most important differences between the radar images resulted
from a Haystack beamwidth about 100 times that of the Firepond. The Firepond beam, 7.5 m wide at
750 km, was sufficient to observe only single targets in a complex scene, whereas the Haystack
radar observed a much larger cross-range extent. It should be emphasized that these Firepond
properties are beneficial; that is, a shorter integration time allows more rapid image generation
(~ 3000 times faster) and a narrow beam is more difficult to detect.
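
The link between wavelength, integration time, and cross-range resolution can be checked with the standard range-Doppler relation, in which the cross-range resolution equals the wavelength divided by twice the aspect-angle change during the integration time. The sketch below assumes a uniformly rotating target; the rotation rate and target resolution are hypothetical values, and only the two wavelengths come from the text.

```python
def cross_range_resolution(wavelength_m, rotation_rate_rad_s, integration_time_s):
    """Cross-range resolution (m): wavelength / (2 * aspect-angle change)."""
    aspect_change_rad = rotation_rate_rad_s * integration_time_s
    return wavelength_m / (2.0 * aspect_change_rad)


def integration_time_for(resolution_m, wavelength_m, rotation_rate_rad_s):
    """Integration time (s) needed to reach a given cross-range resolution."""
    return wavelength_m / (2.0 * rotation_rate_rad_s * resolution_m)


HAYSTACK_WAVELENGTH_M = 3.0e-2     # X-band, 3 cm (from the text)
FIREPOND_WAVELENGTH_M = 11.2e-6    # CO2 laser, 11.2 um (from the text)
ROTATION_RATE_RAD_S = 0.1          # hypothetical apparent rotation rate
TARGET_RESOLUTION_M = 0.25         # hypothetical common resolution

haystack_time = integration_time_for(
    TARGET_RESOLUTION_M, HAYSTACK_WAVELENGTH_M, ROTATION_RATE_RAD_S)
firepond_time = integration_time_for(
    TARGET_RESOLUTION_M, FIREPOND_WAVELENGTH_M, ROTATION_RATE_RAD_S)
time_ratio = haystack_time / firepond_time
```

For equal resolution the ratio of integration times reduces to the ratio of wavelengths, about 2700 here, which is the same order as the ~ 3000 times faster image generation quoted for Firepond. The ratio does not depend on the assumed rotation rate or resolution.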


Figure 14. Balloon-canister deployment sequence for FFI launch: predeployment, deployment,
and postdeployment phases.

As seen in Figure 16, during the predeployment phase (of about 24 s) Firepond images
consisted of only the metallic canister, whereas Haystack images contained returns from the
separating payload. The passive-IR spectrum was weak (~ 0.6 W/sr peak) and earthshine-dominated
with notches at 9.5 and 13 μm. During the 2-s deployment phase the cross-range velocity component
of the ejected doors resulted in a rapid loss of images for Firepond. Two of the doors moved roughly
in parallel to the inflating balloon so that throughout the deployment the Haystack images consisted
of the decoy and nearby doors represented in Figure 16. Note from Figure 15 that in the passive-IR
deployment spectra the balloon graybody radiation dominated the structure in the earthshine
spectrum from the doors. The postdeployment phase of 30 s was determined from the Firepond images
of an inflated carbon cloth cone, Haystack images of the balloon and two sufficiently separated
metallic doors, and a passive-IR carbon cloth graybody spectrum.

The data represented in Figure 16 were input to the sensor fusion system described in Section 2
for a decision of predeployment, deployment, and postdeployment phases. The irreducible ambiguities
inherent in the single sensor data are also observed in Figure 16. The passive-IR sensor
discrimination between deployment and postdeployment was weak due to the graybody domination of
the reflected earthshine spectra. The Firepond sensor was ambiguous between pre- and postdeployment
phases due to the similarity of the canister and balloon range-Doppler images. The shape difference
between the cylindrical canister and the cone-shaped balloon is a weak feature in noise-corrupted
data. Further image processing, such as intensity averaging and smoothing, may enhance the radar
image-based decisions [37]; however, in the sensor fusion experiment preprocessing was limited to
single intensity threshold and centroid operations. The Haystack image set was overall the least
ambiguous due to the generation of complex scenes. During deployment the radar often lost
reflections from the doors and became ambiguous between predeployment and deployment decisions.

3.2  Firefly  Sensor  Fusion  System 

Figure 16. Formulation of multisensor data fusion for balloon-canister deployment:
Haystack and Firepond range-Doppler images and passive-IR simulation of predeployment,
deployment, and postdeployment phases.

Figure 17. Distributed sensor fusion system for FFI balloon-canister deployment detection:
Back propagation SNNs for passive-IR, Haystack, and Firepond sensors and BPFNN.

Figure 17 shows the distributed sensor fusion system used to analyze the Firefly balloon
deployment from Haystack and Firepond range-Doppler images and the passive-IR simulation.
Three back propagation SNNs were trained to output a deployment decision based only on the
individual sensor data. The three output neurons, corresponding to predeployment, deployment,
and postdeployment on each SNN, output an analog value in the range [0,1]. The BPFNN took
the normalized SNN outputs as input and mapped to an overall decision based on the three neuron
SNN outputs for each of the three deployment phases. The SNN and FNN output targets were
(1,0,0) for predeployment, (0,1,0) for deployment, and (0,0,1) for postdeployment. The architecture
in Figure 17 implies that the FNN was trained to perform a cluster analysis in the 9D space of SNN
outputs. The FNN inputs were clustered around ((1,0,0),(1,0,0),(1,0,0)), ((0,1,0),(0,1,0),(0,1,0)),
and ((0,0,1),(0,0,1),(0,0,1)) for predeployment, deployment, and postdeployment, respectively.

The SNNs for Haystack and Firepond had a 20 × 200-pixel input plane, a 4 × 4-neuron middle
layer, and a third output layer with 3 neurons. The passive-IR SNN and the FNN had 16 neurons in
the middle layer, 3 neuron outputs, and input layers of 20 and 9 neurons, respectively. The radar
SNN structure was determined in part by the computational complexity of the fully interconnected 2D
back propagation net and by the minimum number of neurons required for convergence over the
training set of images. The 1D nets (one SNN and the FNN) were not complexity-bound so that
the number of hidden neurons was determined by convergence issues discussed in Section 2.2.
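
The layer sizes described above can be sketched as a forward pass through three sensor nets and one fusion net. This is a minimal untrained illustration with random weights. The sigmoid activation, the sum-to-one normalization of SNN outputs, the ordering of the sensors in the 9-element input, and all names are assumptions, since the text does not specify them here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_net(layer_sizes, rng):
    """Fully connected net: list of (weights, biases), small random weights."""
    net = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        net.append((weights, [0.0] * n_out))
    return net


def forward(net, inputs):
    """Propagate an input vector through every layer of the net."""
    activations = inputs
    for weights, biases in net:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def normalize(outputs):
    """Scale SNN outputs to sum to one (assumed normalization)."""
    total = sum(outputs)
    return [o / total for o in outputs]


rng = random.Random(0)
haystack_snn = make_net([20 * 200, 16, 3], rng)    # 20 x 200 pixels, 4 x 4 hidden
firepond_snn = make_net([20 * 200, 16, 3], rng)
passive_ir_snn = make_net([20, 16, 3], rng)        # 20-sample spectrum
fusion_net = make_net([9, 16, 3], rng)             # 3 sensors x 3 decisions


def run_fusion_system(ir_spectrum, haystack_image, firepond_image):
    """Return the 9-element FNN input and the 3-element fused output."""
    fnn_input = []
    for net, data in ((passive_ir_snn, ir_spectrum),
                      (haystack_snn, haystack_image),
                      (firepond_snn, firepond_image)):
        fnn_input.extend(normalize(forward(net, data)))
    return fnn_input, forward(fusion_net, fnn_input)


# Random placeholder data standing in for a spectrum and two flattened images.
spectrum = [rng.random() for _ in range(20)]
image_a = [rng.random() for _ in range(20 * 200)]
image_b = [rng.random() for _ in range(20 * 200)]
fnn_input, decision = run_fusion_system(spectrum, image_a, image_b)
```

The three output values in `decision` play the role of the predeployment, deployment, and postdeployment neurons; with untrained weights they carry no meaning beyond showing the data flow.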

The Haystack and Firepond SNNs were trained on 3 to 4 images each from predeployment,
deployment, and postdeployment. For each image pair the aggregate passive-IR spectrum was
computed based on the objects in the Haystack images. Training each radar SNN on a training
set of about 12 images using the back propagation learning algorithm required about 30 min on
a Silicon Graphics Workstation. Upon completing SNN training, a set of about 20 images and
passive-IR spectra each from the three deployment phases were propagated through the SNNs.
The normalized SNN outputs formed a training set for the FNN. It should be emphasized that the
training set for the FNN must reflect the uncertainty in decisions from each sensor alone. This
was accomplished by using an FNN training set distinct from the SNN training data, for which the
performance of each SNN is well-represented. Thus, for example, because the Firepond pre- and
postdeployment images were inherently ambiguous, the FNN training set contained Firepond SNN
outputs with about 40% error in pre- and postdeployment detection. This procedure was necessary
for the FNN to learn the extent to which a sensor should be ignored for a given pattern of SNN
outputs. Figure 18 plots the FNN cost function C versus iteration during training. The 1D FNN
converged after about 90 iterations on a training set with about 60 input nine-vectors. The
algorithm ran in approximately 20 s on the Silicon Graphics Workstation.
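
The requirement that the FNN training set reflect each sensor's uncertainty can be sketched by simulating SNN decisions with per-sensor confusion rates. Everything in the sketch is hypothetical: the confusion rows are invented for illustration (only the roughly 40% Firepond pre/postdeployment confusion follows the text), the SNN outputs are idealized as hard one-hot decisions, and in the experiment the training vectors came from real SNN outputs on held-out data.

```python
import random

PHASES = ("predeployment", "deployment", "postdeployment")

# Hypothetical confusion rows: SENSOR_MODEL[sensor][true phase] lists the
# probabilities that the SNN reports phase 0, 1, or 2.
SENSOR_MODEL = {
    "passive_ir": [[1.0, 0.0, 0.0], [0.0, 0.6, 0.4], [0.0, 0.4, 0.6]],
    "haystack":   [[0.8, 0.2, 0.0], [0.3, 0.7, 0.0], [0.0, 0.0, 1.0]],
    "firepond":   [[0.6, 0.0, 0.4], [0.0, 1.0, 0.0], [0.4, 0.0, 0.6]],
}
SENSOR_ORDER = ("passive_ir", "haystack", "firepond")


def one_hot(index, size=3):
    return [1.0 if i == index else 0.0 for i in range(size)]


def simulated_snn_output(sensor, true_phase, rng):
    """Draw the phase an SNN reports and return it as a one-hot vector."""
    weights = SENSOR_MODEL[sensor][true_phase]
    reported = rng.choices(range(3), weights=weights)[0]
    return one_hot(reported)


def build_fnn_training_set(samples_per_phase=20, seed=0):
    """Return (nine_vector, target) pairs, 20 per phase by default."""
    rng = random.Random(seed)
    training_set = []
    for true_phase in range(len(PHASES)):
        for _ in range(samples_per_phase):
            nine_vector = []
            for sensor in SENSOR_ORDER:
                nine_vector.extend(simulated_snn_output(sensor, true_phase, rng))
            training_set.append((nine_vector, one_hot(true_phase)))
    return training_set


training_set = build_fnn_training_set()
```

A fusion net trained on such vectors sees the Firepond block disagree with the target on a sizable fraction of pre- and postdeployment samples, which is the information it needs to down-weight that sensor for those patterns.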

To test the trained sensor fusion system, a performance set was created that contained between
10 and JO radar image pairs each from predeployment, deployment, and postdeployment of novel
data from the same launch. A simulated passive-IR spectrum was generated for each Haystack
image with added random Gaussian noise of deviation 10% of the peak spectral value. The images
and spectra were stacked sequentially in time and propagated through the sensor fusion system.
Figure 19 shows the neuron outputs of the SNNs over the performance set. Note that for the
passive-IR SNN the deployment and postdeployment neurons oscillated in value, reflecting the
ambiguity due to the domination of the reflective door spectrum by the graybody balloon. The
Firepond SNN neurons oscillated during the pre- and postdeployment phases due to the similarity
of the canister and balloon range-Doppler images. Finally, although the Haystack radar SNN had
the best performance overall, there was oscillation during the deployment phase due to the loss of
reflections from the ejected doors. Figure 20 depicts the FNN neuron outputs for the performance
set, which clearly indicates a performance superior to any of the SNNs. This is the desired evidence
of sensor synergism obtained through the fusion of multisensor data.

Figure 18. Fusion neural net cost function C versus training iterations for sensor fusion
system training on FFI balloon-canister deployment detection.

Figure 19. SNN neuron outputs (SNN output versus time): Novel FFI balloon-canister deployment
data for (a) predeployment, (b) deployment, and (c) postdeployment neurons.

A procedure similar to the training and performance tests described above was applied to
the canister-payload deployment in Figure 3. In this case the training set was generated from
the FFI launch, and the system performance was tested on data from the FFII launch. Details
of the analysis will not be described, except to note that the passive-IR spectrum was dominated
by the large metallic payload (3 W/sr peak) in the predeployment and deployment phases. The
radar images contained only the payload during predeployment, the canister and payload during
deployment, and the canister alone during postdeployment. The radar SNNs, therefore, detected
the deployment phases based on image segmentation and payload-canister size differences. The
three neuron output values for each of the SNNs from a performance set of about 60 FFII images
of the canister deployment are shown in Figure 21. The Firepond SNN performance was poor due to
the lack of correct scaling for FFII and a high clutter level in the data. The difficulties in
launch-to-launch cross-range scaling resulted from different object spin rates between FFI and
FFII. For the most part, the problem can be corrected by further postlaunch image processing.
Figure 22 illustrates the three FNN neuron outputs for the performance set of FFII data. As with
the balloon deployment results in Figures 19 and 20, there is clear evidence of sensor synergism
from the distinct FNN neuron outputs during the different phases.

Figure 20. FNN neuron outputs: Novel FFI balloon-canister deployment data for (a)
predeployment, (b) deployment, and (c) postdeployment neurons.

Figure 21. SNN neuron outputs: Novel FFII canister-payload deployment data for (a)
predeployment, (b) deployment, and (c) postdeployment neurons.

Figure 22. FNN neuron outputs: Novel FFII canister-payload deployment data for (a)
predeployment, (b) deployment, and (c) postdeployment neurons.

4.  CONCLUSION


This report describes the theoretical and experimental analysis of neural networks in a
distributed sensor fusion decision-making environment. The architecture consists of sensor-level
decision nodes, which output a decision based on data from a particular sensor. The multisensor
decision outputs form the input to a fusion decision node for an overall decision. The fusion node
performs cluster analysis in the multisensor decision hypothesis space to obtain the system decision.

The theoretical analysis consisted of the application of neural nets to a benchmark problem,
the detection of variance transitions in Gaussian noise, for which a classical hypothesis test is
defined. In both the cases of stand-alone single sensor decision making and multisensor fusion, the
neural nets matched the performance at the classical optimum. In general, the optimum fusion
processor, which is obtained from a log-likelihood test in Chair and Varshney [3], is a perceptron
neural net. This fact motivated the use of an adaptive network at the fusion processor in the
distributed sensor fusion architecture. It was shown that a back propagation net matched the
performance of the optimum fusion processor on the variance transition detection (SXOR) test. The
procedure of net training in the distributed sensor architecture, which requires separate
representative training sets for the sensor and fusion nodes, was reviewed in its application to
the SXOR test.

The experimental analysis of neural net sensor fusion consisted of applying the system to
object deployment detection during the Firefly launch. The sensor inputs consisted of
range-Doppler images from the Haystack (X-band) and Firepond (CO2 laser) radars, as well as a
passive-IR spectral simulation of the tracked objects. The output decisions were the identification
of predeployment, deployment, and postdeployment phases for the release of an inflatable carbon
cloth balloon. The fusion neural net performed a 9D cluster analysis (three sensors with three
decisions) on the output of independently trained sensor neural nets. The system was trained and
performance-tested on data from the first Firefly launch for the detection of balloon deployment.
In a more recent experiment the system was applied to the detection of canister deployment using
training and performance data from the first and second Firefly launches, respectively. The results
clearly demonstrate enhanced fusion performance from the comparison of deployment detection by
the fusion and sensor nets. Through the analysis of sensor ambiguities, it was shown that the fusion
system employs synergism between the various sensors to provide an optimum overall decision.

Distributed sensor fusion processing is a highly relevant procedure for data-based decision
making. The architecture in Figure 1 has built-in robustness against communication failure by
allowing decision making at each sensor processor. The system is also robust against single sensor
failure through the fusion of multiple sensor decisions. This report demonstrates that the
application of neural nets in the architecture takes full advantage of performance enhancements
made possible by data fusion.

APPENDIX A

VARIANCE  TRANSITION  DETECTION 


Equations (3) and (4) are derived to relate detection and false alarm probabilities to the
quantities $\{p(j \mid m) \mid j, m \in \{0,1\}\}$. Recall that the indexes zero and one correspond
to noise deviations $\sigma_0$ and $\sigma_1$, respectively. The pair $(i,j)$ denotes a transition
from deviation $\sigma_i$ to deviation $\sigma_j$, and the expression $p(x \mid y)$ denotes the
probability of $x$ detection conditioned on $y$. The relevant probabilities are then given by
$P_d = p(\text{transition} \mid \text{transition})$ and
$P_f = p(\text{transition} \mid \text{no transition})$ for detection and false alarm. The detection
probability is given by

$$p(\text{transition} \mid \text{transition}) = p((1,0) \mid \text{transition}) + p((0,1) \mid \text{transition}). \tag{A.1}$$


The application of Bayes theorem to Equation (A.1) yields the result

$$P_d = \frac{p((1,0), \text{transition}) + p((0,1), \text{transition})}{p(\text{transition})}, \tag{A.2}$$


where $p(\text{transition})$ represents the prior probability of a transition, which is obtained
either by a (1,0) or a (0,1) noise deviation pair. Equation (A.2) can be written in terms of the
probability for specific deviation pair detection with the result

$$P_d = \frac{p((0,1),(1,0)) + p((0,1),(0,1)) + p((1,0),(1,0)) + p((1,0),(0,1))}{p((1,0)) + p((0,1))}, \tag{A.3}$$


where $p((i,j))$ represents the prior probability of a deviation pair $(i,j)$. Application of Bayes
theorem to Equation (A.3) results in the expression

$$P_d = \frac{[p((0,1) \mid (1,0)) + p((1,0) \mid (1,0))]\, p((1,0))}{p((1,0)) + p((0,1))} + \frac{[p((0,1) \mid (0,1)) + p((1,0) \mid (0,1))]\, p((0,1))}{p((1,0)) + p((0,1))}. \tag{A.4}$$

Recall that $p((i,j) \mid (k,m))$ represents the detection of deviation pair $(i,j)$ conditioned on
the pair $(k,m)$. Assuming that the decision for this occurrence is based on a pair of maximum
likelihood tests before and after the transition, the conditional probabilities factorize, that is,
$p((i,j) \mid (k,m)) = p(i \mid k)\, p(j \mid m)$. Application of this property in Equation (A.4)
results in the expression

$$P_d = p(1 \mid 1)\, p(0 \mid 0) + p(0 \mid 1)\, p(1 \mid 0), \tag{A.5}$$

where $p(i \mid j)$ is given in Equations (5) and (6). It is interesting that the prior
probabilities $p((i,j))$ have cancelled from Equation (A.5), indicating an overall detection
probability independent of the prior distribution of deviation pairs.
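
The cancellation of the priors can be checked numerically. The sketch below evaluates the right-hand side of Equation (A.4) under the factorization $p((i,j) \mid (k,m)) = p(i \mid k)\, p(j \mid m)$ for several prior pairs and compares it with Equation (A.5). The single-test probabilities are hypothetical values chosen for illustration, and the function names are not from the report.

```python
def pair_probability(p_cond, detected, actual):
    """p((i,j)|(k,m)) = p(i|k) p(j|m) for detected pair (i,j), actual pair (k,m)."""
    (i, j), (k, m) = detected, actual
    return p_cond[(i, k)] * p_cond[(j, m)]


def detection_probability_a4(p_cond, prior_10, prior_01):
    """Equation (A.4): detection probability with explicit pair priors."""
    given_10 = (pair_probability(p_cond, (0, 1), (1, 0))
                + pair_probability(p_cond, (1, 0), (1, 0)))
    given_01 = (pair_probability(p_cond, (0, 1), (0, 1))
                + pair_probability(p_cond, (1, 0), (0, 1)))
    return (given_10 * prior_10 + given_01 * prior_01) / (prior_10 + prior_01)


def detection_probability_a5(p_cond):
    """Equation (A.5): closed form without priors."""
    return (p_cond[(1, 1)] * p_cond[(0, 0)]
            + p_cond[(0, 1)] * p_cond[(1, 0)])


# Hypothetical single-test probabilities, keyed as (decided, actual): p(i|k).
P_COND = {(1, 1): 0.9, (0, 1): 0.1, (0, 0): 0.8, (1, 0): 0.2}

closed_form = detection_probability_a5(P_COND)
with_priors = [detection_probability_a4(P_COND, a, b)
               for a, b in ((0.25, 0.25), (0.1, 0.4), (0.45, 0.05))]
```

Every entry of `with_priors` agrees with `closed_form` (0.74 for these assumed values) because both bracketed sums in Equation (A.4) equal the same product pair.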

The  same  argument  applied  to  the  false  alarm  probability  results  in  the  expression 


$$P_f = \frac{p(1 \mid 1)\, p(0 \mid 1)\, p((1,1)) + p(1 \mid 0)\, p(0 \mid 0)\, p((0,0))}{[p((0,0)) + p((1,1))]/2}. \tag{A.6}$$


In this case the probability depends on the prior probabilities $p((0,0))$ and $p((1,1))$ for the
ensemble upon which the hypothesis test is applied. An ensemble in which all deviation pairs
$(i,j)$ are equally likely results in

$$P_f = p(1 \mid 1)\, p(0 \mid 1) + p(0 \mid 0)\, p(1 \mid 0). \tag{A.7}$$
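
For reference, the intermediate step behind the false alarm expressions can be written out by repeating the detection argument with the no-transition pairs (0,0) and (1,1) as the conditioning events. The following is a sketch of that step, assuming the same factorization $p((i,j) \mid (k,m)) = p(i \mid k)\, p(j \mid m)$; it is a reconstruction of the omitted algebra rather than a quotation from the report.

$$P_f = \frac{[p((1,0) \mid (1,1)) + p((0,1) \mid (1,1))]\, p((1,1)) + [p((1,0) \mid (0,0)) + p((0,1) \mid (0,0))]\, p((0,0))}{p((0,0)) + p((1,1))}$$

$$p((1,0) \mid (1,1)) + p((0,1) \mid (1,1)) = 2\, p(1 \mid 1)\, p(0 \mid 1), \qquad p((1,0) \mid (0,0)) + p((0,1) \mid (0,0)) = 2\, p(1 \mid 0)\, p(0 \mid 0)$$

Substituting the two sums gives the factor of 2 that appears as the division by 2 in the denominator of Equation (A.6). Setting $p((0,0)) = p((1,1))$ then cancels the priors and leaves Equation (A.7).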